Ontology Development for the Masses: Creating ICD-11 in WebProtégé

Authors

  • Tania Tudorache
  • Sean M. Falconer
  • Natalya Fridman Noy
  • Csongor Nyulas
  • Tevfik Bedirhan Üstün
  • Margaret-Anne D. Storey
  • Mark A. Musen
Abstract

The World Health Organization is currently developing the 11th revision of the International Classification of Diseases (ICD-11). ICD is the standard diagnostic classification used in health care all over the world. In contrast to previous ICD revisions, which had no formal representation and were available mainly as printed books, ICD-11 uses OWL for the formal representation of its content. In this paper, we report on our work to support the collaborative development of ICD-11 in WebProtégé, a web-based ontology browser and editor. WebProtégé integrates collaboration features directly into the editing process. We report on the results of the evaluation that we performed during a two-week meeting with the ICD editors in Geneva, in the context of the editors learning to use WebProtégé to start the ICD-11 development. Participants in the evaluation were optimistic that collaborative development will work in this context, but they raised a number of critical issues.

1 Creating a Formal Representation of ICD-11

Ontologies and terminologies are a critical component of many knowledge-intensive systems. In recent years, we have seen considerable growth both in the tools that support the collaborative development of ontologies and in the projects that include contributions by a community of experts as a critical part of their workflow.

The development of large biomedical terminologies and ontologies is possible only in a collaborative setting. The Gene Ontology (GO) is one of the more prominent examples of an ontology that is a product of a collaborative process [3]. GO provides terminology for consistent description of gene products in different model-organism databases. Members of the GO community constantly suggest new terms for this ontology, and several full-time curators review the suggestions and incorporate them into GO.
The National Cancer Institute’s Thesaurus (NCI Thesaurus) is another example of a large biomedical ontology that is being developed collaboratively [4]. The Biomed Grid Terminology (BiomedGT) restructures the NCI Thesaurus to facilitate terminology federation and open content development. NCI is using a wiki environment to solicit feedback about the terminology from the community at large. The Ontology for Biomedical Investigations (OBI), a product of the OBI Consortium, is a federated ontology with more than 40 active curators, each responsible for a particular scientific community (e.g., cellular assay, clinical investigations, immunology). Developers of these ontologies use a variety of tools and a broad range of editorial workflows to achieve consensus and to ensure quality [11].

The International Classification of Diseases (ICD) is the standard diagnostic classification developed by the World Health Organization (WHO) to encode information relevant for epidemiology, health management, and clinical use. Health officials use ICD in all United Nations member countries to compile basic health statistics, to monitor health-related spending, and to inform policy makers. In the United States, use of the ICD is also a requirement for all medical billing. Thus, ICD is an essential resource for health care all over the world. The ICD traces its formal origins to the 19th century, and the classification has undergone revisions at regular intervals since then. The current revision of ICD, ICD-10, contains more than 20,000 terms.
In 2007, WHO initiated the work on the 11th revision of ICD (ICD-11) with the mission “to produce an international disease classification that is ready for electronic health records that will serve as a standard for scientific comparability and communication” (footnote 4). ICD-11 will introduce major changes to ICD, which the WHO characterizes as (1) evolving from a focus on mortality and morbidity to a multi-purpose and coherent classification that can capture other uses, such as primary care and public health; (2) creating a multilingual international reference standard for scientific comparability and communication purposes; (3) ensuring that ICD-11 can function in electronic health records (EHRs) by linking ICD to other terminologies and ontologies used in EHRs, such as SNOMED CT; (4) introducing logical structure and definitions in the description of entities and representing ICD-11 in OWL and SKOS.

In addition to these changes in structure and content, the WHO is also radically changing the revision process itself. Whereas the previous revisions were performed by relatively small groups of experts in face-to-face meetings and published only in English and in large tomes, development of ICD-11 will require a Web-based process with thousands of experts contributing to, evaluating, and reviewing the evolving content online.

We have developed a custom-tailored version of WebProtégé, called iCAT, for authoring the alpha draft of ICD-11 (Section 2) (footnote 5). WebProtégé is a Protégé client that supports collaboration and enables distributed users to edit an ontology simultaneously, using their Web browsers for editing. The application presents users with simple forms that reflect the fields in the ICD-11 content model. The tool also incorporates many collaborative features, such as the ability to comment on ontology entities.
In September 2009, WHO gathered its ICD-11 managing editors for iCamp, a two-week meeting with the goal of introducing the editors to the new development process and to the customized WebProtégé tool, developing requirements for further tool support, and evaluating the open development process. In this paper, we report results from an evaluation performed during iCamp, where we focused on the feasibility of an open process for ontology development and the requirements for such a process. To the best of our knowledge, the development of ICD-11 is the largest open collaborative ontology-development experiment of its kind. Thus, we believe that the insights that we gained from our evaluation will be informative to the organizers and developers of similar projects.

Footnote 4: http://sites.google.com/site/icd11revision/home
Footnote 5: A demo version is available at http://icatdemo.stanford.edu

Specifically, this paper makes the following contributions:

  • We describe the customized WebProtégé system that is being used in the collaborative development of ICD-11.
  • We use WebProtégé as the context for an evaluation of feasibility and requirements of a collaborative ontology-development process.

2 WebProtégé and the ICD-11 Customization

Our goal in developing a customized version of WebProtégé is to support the collaborative development of the ICD-11 content. In this section, we give an overview of the main artifact that we are building, the ICD Ontology (Section 2.1), and describe the WebProtégé architecture (Section 2.2). We highlight the key elements of the user interface in iCAT, the custom-tailored version of WebProtégé, in Section 2.3. We joined the ICD revision project in its infancy, when many fundamental issues (content model, representation, workflow) and requirements for the tooling were undefined. Thus, we had to build tools that we could adapt on the fly when changes were made to the underlying model, the user-interface requirements, and the workflow.
In Section 2.4, we describe our design of WebProtégé as a pluggable and extensible platform that enables each project to customize it according to its own requirements. iCAT is in fact a particular configuration of WebProtégé. Finally, support for collaboration among a large number of distributed users is an integral requirement of the ICD revision process. We discuss the collaboration features of WebProtégé in Section 2.5.

2.1 The ICD Ontology

The previous revisions of ICD stored only limited information about a disease, such as the code, title, synonyms, example terms, and simple conditions. The goal of the 11th revision process is to extend the description of diseases to include other attributes: a textual definition of the disease, clinical descriptions (body system, signs and symptoms, severity), causal mechanisms and risk factors, and the functional impact of a disease. To support the richer representation of diseases, the WHO has defined a formal representation of the model in OWL, the ICD Content Model. The content model describes both the attributes of a disease (e.g., Definition, Body System, Severity, Functional Impact, and so on) and the links to external terminologies, mainly to SNOMED CT [14].

The ICD Ontology (footnote 6) is the formal representation of the ICD content model in OWL (Figure 1). The class ICDCategory is the top-level class of the ICD disease hierarchy. The ontology uses a meta-model layer to describe the attributes that a disease class may or should have. For example, the class representing the disease Acute Myocardial Infarction has as a type (among others) the ClinicalDescriptionSection metaclass, which prescribes that the range for the property bodySystem should be the class BodySystemValueSet. In this example, the class Acute Myocardial Infarction has CirculatorySystem as a value for the property bodySystem. All property values describing diseases are reified: they are instances of the class Term.
For each value, we use this reification to record the source of the value (e.g., for a definition of a disease we need to record the supporting evidence in the form of citations or references) and other salient information.

Footnote 6: Accessible at http://icatdemo.stanford.edu/icd_cm/

Fig. 1. A snippet of the ICD Ontology. The ICDCategory is the top-level class in the ICD disease hierarchy and has as types the metaclasses from the meta-model (gray background). The property values of a disease class are instances of the class Term. The ValueSet has as subclasses the different value set hierarchies used in the ontology.

We use LinguisticTerms to represent property values that have different labels in different languages. Because ICD aims to become a multi-language classification, support for multilinguality is paramount. Property values that are instances of the class ReferenceTerm represent links to other terms in external terminologies, such as SNOMED CT. For example, a disease has an associated body part. Rather than defining its own anatomy hierarchy to serve as values for the bodyPart property of a disease, ICD-11 references classes in SNOMED CT that represent anatomical parts. Since it is not practical to import the entire SNOMED CT into ICD-11, the ReferenceTerm class models all the information needed to identify uniquely an entity in an external terminology: the fully qualified name of the external entity, the name of the ontology, the label of the term, and other auxiliary information. This construct allows us to import references to terms in external terminologies and ontologies in a uniform and practical way.
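The ReferenceTerm construct can be pictured as a small record that carries just enough information to identify an external entity. The following is a minimal sketch under stated assumptions, not the ICD Ontology's actual OWL definitions: the class and field names (`external_id`, `ontology`, `label`, `DiseaseClass`, `body_part`) are illustrative, and the identifier is a placeholder rather than a real SNOMED CT identifier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ReferenceTerm:
    """Reference to an entity in an external terminology.

    Field names are illustrative, not the ICD Ontology's property names.
    """
    external_id: str  # fully qualified name of the external entity
    ontology: str     # name of the external ontology
    label: str        # label of the external term


@dataclass
class DiseaseClass:
    """A disease whose bodyPart values are references, not imported classes."""
    title: str
    body_part: List[ReferenceTerm] = field(default_factory=list)


# The identifier below is a placeholder, not a real SNOMED CT identifier.
heart = ReferenceTerm(
    external_id="http://example.org/snomedct/heart",
    ontology="SNOMED CT",
    label="heart",
)
ami = DiseaseClass(title="Acute Myocardial Infarction", body_part=[heart])

for ref in ami.body_part:
    print(f"{ami.title}: bodyPart -> {ref.label} ({ref.ontology})")
```

Storing only the reference keeps the disease description small while still allowing the external entity to be looked up later by its identifier and ontology name.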
2.2 Architecture of WebProtégé

Figure 2 shows a high-level WebProtégé architecture diagram and the interaction of the software components. The core functionality of the application is supported by the Protégé server, which provides access to the ontology content, such as retrieving and changing classes, properties, and individuals in the ontology. The ontologies that the server accesses are stored in a database on the server side. To facilitate the management and reuse of the ICD ontology, we modularized it into several smaller ontologies that import each other. Both the Web-based Protégé client (WebProtégé) and the “traditional” Protégé desktop client access the Protégé server to present the ontologies to the users. Any number of clients of either type can access and edit the same ontology on the server simultaneously. All changes that a user makes in one of the clients are immediately visible in all other clients. The ICD editors use the WebProtégé client to browse and edit ICD-11. The technical-support team often uses the desktop client to make corrections or perform operations that are not supported in the Web interface.

Fig. 2. An architecture diagram of the customized WebProtégé for ICD. The ICD ontology content is accessible through both a Protégé desktop client and in a Web browser. WebProtégé accesses BioPortal for searching terms to import as external references. Both WebProtégé and the Protégé desktop clients connect to a Protégé server to read and write the ontology content and information that supports the collaboration features.

In order to search external biomedical terminologies and to import terms from these terminologies, WebProtégé accesses BioPortal, a repository of about 200 biomedical ontologies and terminologies [9]. BioPortal provides REST service access that enables search across different ontologies and access to information about specific terms.

Support for collaboration among users is one of the key features of WebProtégé.
We have developed a general-purpose collaboration framework in Protégé [15], and we use the same framework in WebProtégé. This framework provides Java APIs for tracking changes in an ontology and for storing notes and discussion threads attached to ontology entities. We also reuse the generic access-policy mechanism of the Protégé server, which allows us to define customized access policies for an ontology (e.g., a user who has only read access will not be able to edit the ontology).

2.3 Features of the WebProtégé User Interface

WebProtégé is a web portal, inspired by other portals such as myYahoo or iGoogle. Our vision is to enable users to build a custom user interface by combining existing components in a form that is appropriate for their project. The user interface is composed of tabs, either predefined or user-defined. A new tab is an empty container in which users can add portlets and arrange them by drag and drop. A portlet is a user interface component that provides some functionality. For example, the Class tree portlet displays the class hierarchy in an ontology and has support for class-level operations (create and delete class, move class in hierarchy, etc.).

Fig. 3. The WebProtégé user interface customized for ICD. The interface is composed of tabs. Each tab contains one or more panels, called portlets, that can be arranged by drag and drop. The left-hand portlet shows the disease class hierarchy of the ICD ontology. The right portlet shows the fields of the disease selected in the tree, in this case D04 Carcinoma in situ of skin.

Figure 3 shows one of the tabs in the customized WebProtégé interface for ICD, known to the domain experts as iCAT. The ICD Content tab contains two portlets: the class tree portlet, which shows only a branch of the ICD ontology, and a details portlet, which shows the property values of the class selected in the class tree in a simple form-based interface.
The domain experts are familiar with this type of interface from many other applications. For each property, we use a specific widget to acquire the property values. For example, we use a text-field widget to record the values of the ICD title property (Figure 3). As we mentioned in Section 2.1, all values of properties describing a disease are reified as instances of the Term class. We use an instance-table widget to hide this extra reification layer from the user and to present all the details about the reified instance directly in the form for the disease. The widget presents a pre-configured set of property values for the term instance as columns in the table. An example of this widget, for the External Definition property, appears in Figure 3.

Most attributes for diseases have values that are references to terms in external terminologies and ontologies. For example, the property bodyPart takes as values references to the Anatomy branch of SNOMED CT (see Section 2.1). We have developed a generic Reference Portlet that supports the simple import of an external reference with a single mouse click. The portlet uses RESTful Web services to search terms in BioPortal. For example, the bodyPart for Acute Myocardial Infarction should be a reference to “heart” from SNOMED CT. The search in BioPortal will return a list of matched terms. To decide which SNOMED CT term to import, the user may get more information about each search result either in textual form or as a graph visualization, both of which are also retrieved via Web service calls to BioPortal. The Reference Portlet is also configurable. We can specify in which ontology the search should be performed. We can also restrict the search to a particular ontology branch in the configuration of the portlet (e.g., the Anatomy branch in SNOMED CT).
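The effect of configuring the Reference Portlet with an ontology and a branch can be illustrated with a toy, in-memory search. This is only a sketch of the filtering idea: it does not call BioPortal and does not reflect its REST API, and every name in it (`ExternalTerm`, `in_branch`, `search_terms`, the term identifiers, and the sample data) is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ExternalTerm:
    term_id: str           # hypothetical identifier
    label: str
    ontology: str
    parent: Optional[str]  # identifier of the parent term, None for a root


def in_branch(term: ExternalTerm, by_id: Dict[str, ExternalTerm], root_id: str) -> bool:
    """Return True if `term` is `root_id` itself or one of its descendants."""
    current: Optional[ExternalTerm] = term
    while current is not None:
        if current.term_id == root_id:
            return True
        current = by_id.get(current.parent) if current.parent else None
    return False


def search_terms(terms: List[ExternalTerm], query: str, ontology: str,
                 branch_root: Optional[str] = None) -> List[ExternalTerm]:
    """Case-insensitive label search limited to one ontology and, optionally, one branch."""
    by_id = {t.term_id: t for t in terms}
    needle = query.lower()
    return [
        t for t in terms
        if t.ontology == ontology
        and needle in t.label.lower()
        and (branch_root is None or in_branch(t, by_id, branch_root))
    ]


# Toy data standing in for an external terminology; not real SNOMED CT content.
TERMS = [
    ExternalTerm("anatomy", "Anatomy", "SNOMED CT", None),
    ExternalTerm("heart", "Heart", "SNOMED CT", "anatomy"),
    ExternalTerm("disorder", "Disorder", "SNOMED CT", None),
    ExternalTerm("heart-disease", "Heart disease", "SNOMED CT", "disorder"),
    ExternalTerm("other-heart", "Heart", "Another ontology", None),
]

matches = search_terms(TERMS, "heart", ontology="SNOMED CT", branch_root="anatomy")
print([t.term_id for t in matches])
```

Restricting by branch removes matches such as a disorder whose label merely contains the query, which is the kind of narrowing the portlet configuration is meant to provide.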
2.4 Configuring the User Interface

We noted earlier that one of our key goals in designing WebProtégé was to have a tool that can be configured easily for many different settings, workflows, and types of users. Indeed, users can configure almost everything in the WebProtégé portlets by describing the configuration in an XML file with a predefined schema (footnote 7). Building a new tool based on WebProtégé can be as simple as defining a layout configuration for existing portlets.

To support this flexibility, each portlet has a property list attached to it in the XML layout file, which we can use to provide additional configuration information. For example, the class tree portlet in Figure 3 displays only the disease hierarchy of the ICD Ontology, with the ICDCategory class as the root. We defined one property of the portlet, topClass, that points to the ICDCategory class in the configuration file. Thus, we can reuse the class tree portlet to display different class-tree views by simply changing a property of the portlet.

The declarative user interface also allows us to define custom views for different users. In WebProtégé, layout configurations can be defined per user and per project. Therefore, different users can see the same ontology rendered in different ways. One can imagine a scenario in which a user works only on a branch of an ontology, or one in which users should see only a selection of portlets. We can support these scenarios by defining different configuration files for users.

We mentioned earlier that portlets provide independent pieces of functionality. Therefore, we tried to avoid creating hard-coded dependencies between portlets in order to be able to reuse them in different configurations. For example, selecting a class in the class tree portlet should trigger the display of property values in a different portlet. Rather than hard-coding this dependency, we defined a generic selection-model mechanism.
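A selection-model mechanism of this kind can be sketched as a small observer pattern, in which one portlet drives the selection and the others merely listen. The sketch below is an illustration under stated assumptions, not WebProtégé's implementation: the class and method names (`SelectionModel`, `add_listener`, `select`, `click`, `on_selection_changed`) are hypothetical.

```python
from typing import Callable, List, Optional


class SelectionModel:
    """Holds the current selection of a tab and notifies registered listeners."""

    def __init__(self) -> None:
        self._selection: Optional[str] = None
        self._listeners: List[Callable[[str], None]] = []

    def add_listener(self, listener: Callable[[str], None]) -> None:
        self._listeners.append(listener)

    def select(self, entity: str) -> None:
        if entity == self._selection:
            return  # nothing changed, so nobody is notified
        self._selection = entity
        for listener in self._listeners:
            listener(entity)


class ClassTreePortlet:
    """Portlet that drives the selection: user clicks become selection changes."""

    def __init__(self, model: SelectionModel) -> None:
        self._model = model

    def click(self, class_name: str) -> None:
        self._model.select(class_name)


class DetailsPortlet:
    """Dependent portlet: shows whatever is currently selected."""

    def __init__(self, model: SelectionModel) -> None:
        self.shown: Optional[str] = None
        model.add_listener(self.on_selection_changed)

    def on_selection_changed(self, entity: str) -> None:
        self.shown = entity


model = SelectionModel()
tree = ClassTreePortlet(model)
details = DetailsPortlet(model)
tree.click("D04 Carcinoma in situ of skin")
print(details.shown)
```

Neither portlet refers to the other; both know only the shared selection model, so either can be swapped or reused in a different layout.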
Each tab has a controlling portlet: the portlet that provides the selection for the other portlets in the tab. Each time the selection in the controlling portlet (e.g., the class tree portlet) changes, the other portlets are informed of the change via a listener mechanism and can update their content accordingly. The XML layout configuration file specifies the controlling portlet for a tab, which can be changed at runtime.

Footnote 7: XML layout configuration examples available at: http://tinyurl.com/y35qazg

2.5 Support for Collaboration

We implemented the collaboration framework on the server side (Section 2.2), and we expose it in the user interface. Distributed users can edit the same ontology simultaneously and immediately see the results of one another’s changes.

Users can add notes to classes, properties, and individuals in the ontology. They can also reply to notes that were posted by others. At the time of this writing, there are more than 1,300 notes in the production version of WebProtégé for ICD. Notes may have different types, such as Comment or Explanation. When users browse the class hierarchy, they can see the number of notes that are attached to each class and the number of notes in the subclasses of that class. In Figure 3, the icon next to the class name indicates, for example, that the class D04 Carcinoma in situ of skin has two notes attached to it. The shaded icon next to it indicates that there are also two notes in the subtree rooted at this class. Knowing the number of notes in a subtree enables users to identify quickly the branches of ontologies that have the most activity and discussion, and also to find the notes that are attached somewhere deeper in the class hierarchy.

Users can also attach notes to specific triples. For example, a user may want to comment on a particular definition of a disease. The user may do so by clicking on the comment icon next to a particular property value (see Figure 3).
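The two note counters shown next to a class (notes on the class itself and notes further down its subtree) can be computed with one post-order traversal. The sketch below is illustrative only and is not taken from WebProtégé: it assumes that the hierarchy is a tree in which every class has a single parent, that the subtree counter excludes the class's own notes, and the names (`count_notes`, the class names, and the sample counts) are made up.

```python
from typing import Dict, List, Tuple


def count_notes(children: Dict[str, List[str]],
                direct_notes: Dict[str, int],
                root: str) -> Dict[str, Tuple[int, int]]:
    """Map each class to (notes on the class, notes on all of its descendants)."""
    counts: Dict[str, Tuple[int, int]] = {}

    def visit(cls: str) -> None:
        below = 0
        for child in children.get(cls, []):
            visit(child)  # children first, so their totals are ready
            own, sub = counts[child]
            below += own + sub
        counts[cls] = (direct_notes.get(cls, 0), below)

    visit(root)
    return counts


# Made-up hierarchy and note counts, for illustration only.
children = {"A": ["A1", "A2"], "A1": ["A1a"]}
direct_notes = {"A": 2, "A2": 1, "A1a": 1}

for cls, (own, below) in count_notes(children, direct_notes, "A").items():
    print(f"{cls}: {own} note(s) on the class, {below} in its subtree")
```

A class with no notes of its own but a non-zero subtree count, such as `A1` here, is exactly the case where the second counter helps a user find discussion that sits deeper in the hierarchy.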
The Notes and Discussions Tab is a dedicated interface for browsing and creating notes and discussions.

WHO plans to use peer review to ensure the quality of the ICD content. In the current implementation, WebProtégé supports a prototypical implementation of a reviewing mechanism in the Reviews Tab. A user with the appropriate privileges can request a review for a particular disease class. The user may choose from a list of predefined reviewers who specialize in the particular domain of the disease. Once the review is complete, the reviewer may log into the system and add a review to a class. Internally, we represent Reviews as a specific type of note in WebProtégé.

The WHO is still working to define the workflow of the ICD-11 revision process. We envision that WebProtégé will support this workflow in a generic and flexible way. Currently, we support only parts of the workflow. WebProtégé already has a generic access-policy mechanism, which we use to define the different user roles (TAG member, managing editor, etc.) and their access rights. The user interface enforces the access rights, and we can configure it for different user roles. However, much remains to be done. The main workflow defining how the operations should flow for different user roles is still under development. We currently plan to expose the WebProtégé platform to a larger audience, which will likely have a lower level of expertise than the current users. Members of this broader community should be able to make proposals for changes. We are currently working out the details of how such a proposal mechanism should work. Once we have a well-defined workflow, we will investigate how to develop the tool to support a flexible and generic workflow mechanism.

Publication date: 2010